223 research outputs found

    Identity and Granularity of Events in Text

    In this paper we describe a method to detect event descriptions in different news articles and to model the semantics of events and their components using RDF representations. We compare these descriptions to solve a cross-document event coreference task. Our component approach to event semantics defines identity and granularity of events at different levels. It performs close to state-of-the-art approaches on the cross-document event coreference task, while outperforming other works when assuming similar quality of event detection. We demonstrate how granularity and identity are interconnected and we discuss how semantic anomaly could be used to define differences between coreference, subevent and topical relations. Comment: Invited keynote speech by Piek Vossen at Cicling 201

    MedRoBERTa.nl: A Language Model for Dutch Electronic Health Records

    This paper presents MedRoBERTa.nl as the first Transformer-based language model for Dutch medical language. We show that using 13GB of text data from Dutch hospital notes, pre-training from scratch results in a better domain-specific language model than further pre-training RobBERT. When extending pre-training on RobBERT, we use a domain-specific vocabulary and re-train the embedding look-up layer. We show that MedRoBERTa.nl, the model that was trained from scratch, outperforms general language models for Dutch on a medical odd-one-out similarity task. MedRoBERTa.nl already reaches higher performance than general language models for Dutch on this task after only 10k pre-training steps. When fine-tuned, MedRoBERTa.nl outperforms general language models for Dutch in a task classifying sentences from Dutch hospital notes that contain information about patients' mobility levels.

    The Linguistic versus Cognitive Role of Classifying Nouns

    Semantic classifications play a role in both the organization of our world knowledge and our vocabulary. Lexicons with semantic information are likewise often organized as taxonomies in which specific words are related or decomposed to (a small set of) more general words. Conceptual knowledge bases are structured as networks in which redundancies are predicted by more general concepts for more specific concepts. Even though lexical knowledge and conceptual knowledge do not necessarily coincide, in practice, most databases do not make a clear distinction between these two types of knowledge. In semantic lexicons and dictionaries it is not clear what is being described - knowledge of words or knowledge of worlds - and conceptual databases often just store information for the same words in capital letters, suggesting that a definition of CAR is automatically different from a definition of "car". Because no clear distinction is made in the role of the semantic information, it is also not clear what criteria are to be used to evaluate that information or, more specifically, what is the role of classification structures in these specifications.

    ReferenceNet: a semantic-pragmatic network for capturing reference relations

    In this paper, we present ReferenceNet: a semantic-pragmatic network of reference relations between synsets. Synonyms are assumed to be exchangeable in similar contexts and also word embeddings are based on sharing of local contexts represented as vectors. Co-referring words, however, tend to occur in the same topical context but in different local contexts. In addition, they may express different concepts related through topical coherence, and through author framing and perspective. In this paper, we describe how reference relations can be added to WordNet and how they can be acquired. We evaluate two methods of extracting event coreference relations using WordNet relations against a manual annotation of 38 documents within the same topical domain of gun violence. We conclude that precision is reasonable but recall is lower because the WordNet hierarchy does not sufficiently capture the required coherence and perspective relations.

    A Narratology-Based Framework for Storyline Extraction

    Stories are a pervasive phenomenon of human life. They also represent a cognitive tool to understand and make sense of the world and of its happenings. In this contribution we describe a narratology-based framework for modeling stories as a combination of different data structures and for automatically extracting them from news articles. We introduce a distinction among three data structures (timelines, causelines, and storylines) that capture different narratological dimensions: respectively, chronological ordering, causal connections, and plot structure. We developed the Circumstantial Event Ontology (CEO) for modeling (implicit) circumstantial relations as well as explicit causal relations and created two benchmark corpora: ECB+/CEO, for causelines, and the Event Storyline Corpus (ESC), for storylines. To test our framework and the difficulty of automatically extracting causelines and storylines, we develop a series of reasonable baseline systems.

    Finding Stories in 1,784,532 Events: Scaling Up Computational Models of Narrative

    Information professionals face the challenge of making sense of an ever-increasing amount of information. Storylines can provide a useful way to present relevant information because they reveal explanatory relations between events. In this position paper, we present and discuss the four main challenges that make it difficult to get to these stories and our first ideas on how to start resolving them.

    Cross-linguistic differences and similarities in image descriptions

    Automatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such datasets for languages other than English. An unexplored question is how different these datasets are from English and, if there are any differences, what causes them to differ. This paper provides a cross-linguistic comparison of Dutch, English, and German image descriptions. We find that these descriptions are similar in many respects, but the familiarity of crowd workers with the subjects of the images has a noticeable influence on description specificity. Comment: Accepted for INLG 2017, Santiago de Compostela, Spain, 4-7 September, 2017. Camera-ready version. See the ACL anthology for full bibliographic information.

    Talking about other people: An endless range of possibilities

    Image description datasets, such as Flickr30K and MS COCO, show a high degree of variation in the ways that crowd-workers talk about the world. Although this gives us a rich and diverse collection of data to work with, it also introduces uncertainty about how the world should be described. This paper shows the extent of this uncertainty in the PEOPLE-domain. We present a taxonomy of different ways to talk about other people. This taxonomy serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.